Diffusion Models#
Run this page in Google Colab using the icon.
These lecture notes use packages including PyTorch and torchvision. PyTorch does not yet work with the live-code feature, so use the Google Colab icon to run the code.
1. Motivation and Recap#
VAEs and GANs both aim to model complex data distributions.
VAEs: reconstruct well but can produce blurry samples due to simplified likelihood assumptions.
GANs: generate sharp samples but training can be unstable (mode collapse, convergence issues).
Diffusion Models offer a new paradigm:
They learn to generate data by gradually denoising a noisy input, reversing a diffusion (noise) process.
State-of-the-art in generative modeling, powering modern tools like Stable Diffusion and DALL·E 2.
2. Diffusion Models: Core Idea#
A diffusion model defines two processes:
Forward Diffusion Process (noising):
Gradually adds Gaussian noise to data \( \mathbf{x}_0 \) over \( T \) steps until it becomes nearly pure noise \( \mathbf{x}_T \).
Reverse Diffusion Process (denoising):
Learns to reverse the noising step-by-step, generating realistic samples from noise.
3. Forward Diffusion Process#
We start with data sample \( \mathbf{x}_0 \sim q(\mathbf{x}_0) \).
Noise is added in small steps controlled by a variance schedule \( \{\beta_t\}_{t=1}^T \), where \( \beta_t \in (0,1) \).
3.1 Single Step#
Each step adds a small amount of Gaussian noise:
\[ q(\mathbf{x}_t \mid \mathbf{x}_{t-1}) = \mathcal{N}\!\left(\mathbf{x}_t;\ \sqrt{1-\beta_t}\,\mathbf{x}_{t-1},\ \beta_t \mathbf{I}\right), \]
equivalently \( \mathbf{x}_t = \sqrt{1-\beta_t}\,\mathbf{x}_{t-1} + \sqrt{\beta_t}\,\boldsymbol{\epsilon} \) with \( \boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}) \).
3.2 Closed-Form Sampling#
We can directly sample \( \mathbf{x}_t \) at step \( t \) from \( \mathbf{x}_0 \):
\[ \mathbf{x}_t = \sqrt{\bar{\alpha}_t}\,\mathbf{x}_0 + \sqrt{1-\bar{\alpha}_t}\,\boldsymbol{\epsilon}, \]
where:
\[ \alpha_t = 1 - \beta_t, \qquad \bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s, \]
and noise injection is:
\[ \boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}). \]
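The closed-form jump can be checked numerically. The sketch below (a toy 1-D NumPy experiment; the linear schedule mirrors the animation code later in these notes) compares iterating the single-step update \( T \) times against sampling \( \mathbf{x}_T \) directly from \( \mathbf{x}_0 \) — both should give the same distribution:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 200
beta = np.linspace(1e-4, 0.1, T)      # linear variance schedule
alpha = 1.0 - beta
bar_alpha = np.cumprod(alpha)         # \bar{alpha}_t = prod_s alpha_s

x0 = rng.uniform(-2, 2, size=100_000) # toy 1-D "data"

# Iterative forward process: apply T single steps
x = x0.copy()
for t in range(T):
    x = np.sqrt(alpha[t]) * x + np.sqrt(beta[t]) * rng.standard_normal(x.shape)

# Closed-form jump straight to step T
eps = rng.standard_normal(x0.shape)
x_closed = np.sqrt(bar_alpha[-1]) * x0 + np.sqrt(1.0 - bar_alpha[-1]) * eps

# Both should have (approximately) the same mean and standard deviation,
# close to the standard normal since bar_alpha_T is tiny here
print(x.mean(), x.std())
print(x_closed.mean(), x_closed.std())
```

With this schedule \( \bar{\alpha}_T \approx 0 \), so both routes end up essentially standard normal, illustrating why the closed form is used for training.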
4. Reverse Diffusion Process#
We want to reverse the noising process:
\[ p_{\theta}(\mathbf{x}_{t-1} \mid \mathbf{x}_t) = \mathcal{N}\!\left(\mathbf{x}_{t-1};\ \boldsymbol{\mu}_{\theta}(\mathbf{x}_t, t),\ \boldsymbol{\Sigma}_{\theta}(\mathbf{x}_t, t)\right). \]
The reverse conditionals are also Gaussian but unknown.
A neural network (usually a U-Net) is trained to predict the mean (and sometimes variance).
In practice, instead of directly predicting \( \mu \), we train the network to predict the noise \( \boldsymbol{\epsilon} \) added at each step.
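One reverse (ancestral sampling) step can be sketched as below. This is a minimal NumPy illustration, assuming the common DDPM choice \( \sigma_t^2 = \beta_t \); `eps_model` is a hypothetical placeholder for a trained noise-prediction network (in practice a U-Net conditioned on \( t \)), so the resulting samples here are not meaningful:

```python
import numpy as np

def eps_model(x_t, t):
    # Hypothetical placeholder: a real model is a trained network
    return np.zeros_like(x_t)

def reverse_step(x_t, t, beta, alpha, bar_alpha, rng):
    """One DDPM ancestral sampling step: x_t -> x_{t-1}."""
    eps_hat = eps_model(x_t, t)
    # Posterior mean written in terms of the predicted noise
    mean = (x_t - beta[t] / np.sqrt(1.0 - bar_alpha[t]) * eps_hat) / np.sqrt(alpha[t])
    if t == 0:
        return mean                    # no noise added at the final step
    z = rng.standard_normal(x_t.shape)
    return mean + np.sqrt(beta[t]) * z # sigma_t^2 = beta_t (one common choice)

rng = np.random.default_rng(0)
T = 200
beta = np.linspace(1e-4, 0.1, T)
alpha = 1.0 - beta
bar_alpha = np.cumprod(alpha)

x = rng.standard_normal(1000)          # start from pure noise x_T
for t in reversed(range(T)):
    x = reverse_step(x, t, beta, alpha, bar_alpha, rng)
```

The key point is the mean formula: given a good noise estimate \( \boldsymbol{\epsilon}_\theta \), the step removes a small fraction of the noise at a time.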
5. Diffusion Model Training Objective#
Using variational inference, the training objective reduces to:
\[ \mathcal{L}_{\text{simple}} = \mathbb{E}_{t,\,\mathbf{x}_0,\,\boldsymbol{\epsilon}} \left[ \left\| \boldsymbol{\epsilon} - \boldsymbol{\epsilon}_{\theta}\!\left( \sqrt{\bar{\alpha}_t}\,\mathbf{x}_0 + \sqrt{1-\bar{\alpha}_t}\,\boldsymbol{\epsilon},\ t \right) \right\|^2 \right]. \]
Here, \( \boldsymbol{\epsilon} \) is the true Gaussian noise,
\( \boldsymbol{\epsilon}_{\theta} \) is the predicted noise by the network.
Thus, the network learns to denoise step by step.
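A single training step of this objective can be sketched in a few lines. This is a NumPy illustration only; `eps_theta` is a hypothetical stand-in for the network, so the loss here is just the second moment of the noise (≈ 1):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 200
beta = np.linspace(1e-4, 0.1, T)
bar_alpha = np.cumprod(1.0 - beta)

def eps_theta(x_t, t):
    # Hypothetical stand-in for the network's noise prediction
    return np.zeros_like(x_t)

# One training step of the simplified objective:
x0 = rng.uniform(-2, 2, size=(64, 8))           # toy minibatch
t = rng.integers(0, T, size=(64, 1))            # random timestep per sample
eps = rng.standard_normal(x0.shape)             # true Gaussian noise
x_t = np.sqrt(bar_alpha[t]) * x0 + np.sqrt(1.0 - bar_alpha[t]) * eps
loss = np.mean((eps - eps_theta(x_t, t)) ** 2)  # simple MSE on the noise
print(loss)
```

Note the closed-form sampling from Section 3.2 is what makes this efficient: each minibatch element can be noised to a random timestep in one shot.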
6. Comparison: VAEs vs Diffusion Models#
VAE Loss:
\[ \mathcal{L}_{\text{VAE}} = -\,\mathbb{E}_{q(\mathbf{z}\mid\mathbf{x})}\!\left[\log p(\mathbf{x}\mid\mathbf{z})\right] + D_{\mathrm{KL}}\!\left( q(\mathbf{z}\mid\mathbf{x}) \,\|\, p(\mathbf{z}) \right) \]
Diffusion Loss:
\[ \mathcal{L}_{\text{diffusion}} = \mathbb{E}_{t,\,\mathbf{x}_0,\,\boldsymbol{\epsilon}} \left[ \left\| \boldsymbol{\epsilon} - \boldsymbol{\epsilon}_{\theta}(\mathbf{x}_t, t) \right\|^2 \right] \]
Diffusion models avoid blurry samples because they never impose a simple per-pixel likelihood on the data; instead, the data distribution is learned implicitly through iterative noise prediction.
7. Conditional Diffusion Models#
Diffusion can be conditioned to control generation:
Classifier Guidance:
Train a separate classifier \( p(y \mid \mathbf{x}_t) \) to guide sampling.
Classifier-Free Guidance:
Train a single model with both conditional and unconditional objectives.
During sampling, interpolate between unconditional and conditional predictions.
These allow controlled image generation (e.g., “generate a cat” or “generate digit 5”).
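The classifier-free combination step is just a linear interpolation/extrapolation of the two noise predictions. A minimal sketch (the function name and toy vectors are illustrative, not from a specific library):

```python
import numpy as np

def guided_eps(eps_uncond, eps_cond, w):
    """Classifier-free guidance: move from the unconditional
    prediction toward (and past) the conditional one with weight w."""
    return eps_uncond + w * (eps_cond - eps_uncond)

eps_u = np.array([0.0, 1.0])  # unconditional prediction
eps_c = np.array([1.0, 1.0])  # conditional prediction
print(guided_eps(eps_u, eps_c, 0.0))  # w=0: purely unconditional
print(guided_eps(eps_u, eps_c, 1.0))  # w=1: purely conditional
print(guided_eps(eps_u, eps_c, 3.0))  # w>1: amplifies the condition
```

In practice \( w > 1 \) is common: it exaggerates the direction that the condition pushes the prediction, trading sample diversity for prompt fidelity.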
8. Latent Diffusion Models (LDMs)#
Pixel-space diffusion is computationally expensive.
Latent Diffusion Models operate in a compressed latent space learned by an autoencoder:
Train a VAE/autoencoder to map data \( \mathbf{x} \) into latent space \( \mathbf{z} \).
Perform diffusion in latent space (cheaper, faster).
Decode final denoised latent back to image.
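The encode → diffuse-in-latent-space → decode pipeline can be sketched with toy stand-ins. Here `encode`/`decode` are hypothetical placeholders (a real LDM uses a learned VAE, and a U-Net denoiser would run between them); the point is only the dimensionality saving:

```python
import numpy as np

def encode(x):
    # Hypothetical encoder: average adjacent groups of 4 values (4x smaller)
    return x.reshape(x.shape[0], -1, 4).mean(axis=2)

def decode(z):
    # Hypothetical decoder: nearest-neighbour upsampling back to data space
    return np.repeat(z, 4, axis=1)

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 16))  # two toy "images" of 16 pixels
z = encode(x)                     # diffusion would run here, in latent space
x_rec = decode(z)
print(z.shape, x_rec.shape)       # latent is 4x smaller than the data
```

Running the (expensive) denoising network on 4 latent dimensions instead of 16 pixels is exactly the cost argument, scaled up: Stable Diffusion diffuses over latents roughly 48x smaller than the pixel grid.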
8.1 Cross-Attention in LDMs#
Enables conditioning on text prompts (as in Stable Diffusion).
Attention layers integrate semantic information into the denoising process.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
from scipy.stats import kstest
from IPython.display import HTML
# === Parameters ===
np.random.seed(42)
n_samples = 10000 # Number of data points
T = 200 # Number of diffusion steps
beta_start = 1e-4 # Starting noise level
beta_end = 0.1 # Ending noise level
# === Initial Data: piecewise uniform with gaps ===
n1 = n_samples // 3
n2 = n_samples // 3
n3 = n_samples - n1 - n2
initial_data = np.concatenate([
    np.random.uniform(-10, -6, size=n1),
    np.random.uniform(-2, 2, size=n2),
    np.random.uniform(6, 10, size=n3),
])
# === Noise schedule ===
beta = np.linspace(beta_start, beta_end, T)
alpha = 1.0 - beta
bar_alpha = np.cumprod(alpha)
# === Precompute forward diffusion steps ===
diffusion_steps = [initial_data.copy()]
for t in range(T):
    x_prev = diffusion_steps[-1]
    noise = np.random.randn(n_samples)
    x_next = np.sqrt(alpha[t]) * x_prev + np.sqrt(beta[t]) * noise
    diffusion_steps.append(x_next)
# === Test normality at each step (KS test) ===
normality_p = []
for data in diffusion_steps:
    m, s = data.mean(), data.std()
    _, pvalue = kstest(data, 'norm', args=(m, s))
    normality_p.append(pvalue)
# === Set up histogram bins & Gaussian PDF ===
bins = np.linspace(-12, 12, 60)
bin_centers = 0.5 * (bins[:-1] + bins[1:])
x_grid = np.linspace(-12, 12, 500)
gauss_pdf = np.exp(-0.5 * x_grid**2) / np.sqrt(2 * np.pi)
# === Create figure & initial plot ===
fig, ax = plt.subplots(figsize=(8, 5))
ax.set_xlim(-12, 12)
ax.set_ylim(0, 0.45)
ax.set_xlabel("Value")
ax.set_ylabel("Density")
ax.grid(True)
# initial histogram
hist_vals, _ = np.histogram(initial_data, bins=bins, density=True)
bars = ax.bar(bin_centers, hist_vals, width=bins[1]-bins[0], alpha=0.6, color='orange')
# overlay final Gaussian curve
line_pdf, = ax.plot(x_grid, gauss_pdf, 'r--', lw=2, label='Standard Gaussian')
ax.legend(loc='upper right')
# equation text (constant)
equation_text = ax.text(
    0.5, 1.08,
    r"$x_t = \sqrt{1-\beta_t}\,x_{t-1} + \sqrt{\beta_t}\,\epsilon,\quad \epsilon\sim\mathcal{N}(0,I)$",
    transform=ax.transAxes, ha="center", va="bottom", fontsize=12
)
# subtitle text (updates each frame)
subtitle_text = ax.text(
    0.5, 1.02,
    "", transform=ax.transAxes, ha="center", va="bottom", fontsize=10
)
# === Animation update function ===
def update(frame):
    data = diffusion_steps[frame]
    hist_vals, _ = np.histogram(data, bins=bins, density=True)
    for bar, h in zip(bars, hist_vals):
        bar.set_height(h)
    pval = normality_p[frame]
    subtitle_text.set_text(f"Step {frame}/{T} | KS p-value = {pval:.3f} (large → consistent with Gaussian, small → not Gaussian)")
    return (*bars, subtitle_text)
# === Create Animation ===
ani = FuncAnimation(
    fig, update,
    frames=len(diffusion_steps),
    interval=50,
    blit=True
)
# prevent static plot from showing
plt.close(fig)
# Display in Jupyter
HTML(ani.to_jshtml())
import numpy as np
import matplotlib.pyplot as plt
# === Parameters ===
np.random.seed(42)
n_samples = 10000 # Number of data points
T = 200 # Number of diffusion steps
beta_start = 1e-4 # Starting noise level
beta_end = 0.1 # Ending noise level
# === Initial Data: piecewise uniform with gaps ===
n1 = n_samples // 3
n2 = n_samples // 3
n3 = n_samples - n1 - n2
initial_data = np.concatenate([
    np.random.uniform(-10, -6, size=n1),
    np.random.uniform(-2, 2, size=n2),
    np.random.uniform(6, 10, size=n3),
])
# === Define Beta Schedules ===
schedules = {
    'Linear': np.linspace(beta_start, beta_end, T),
    'Quadratic': np.linspace(np.sqrt(beta_start), np.sqrt(beta_end), T)**2,
    'Constant': np.full(T, beta_end)
}
# === Timesteps to visualize ===
timesteps = [0, T//2, T] # start, mid, end
# === Histogram bins ===
bins = np.linspace(-12, 12, 60)
# === Standard normal PDF for overlay ===
x_grid = np.linspace(-12, 12, 500)
gauss_pdf = np.exp(-0.5 * x_grid**2) / np.sqrt(2 * np.pi)
# === Colors matching the screenshot style ===
bar_color = "#F3B762"
edge_color = "#6D4301"
gauss_color = "r"
# === Create subplots: add extra col for beta curves ===
fig, axes = plt.subplots(
    nrows=len(schedules),
    ncols=len(timesteps) + 1,
    figsize=(15, 8),
    sharey=True
)
# === Plot Beta schedules column (leftmost) ===
for i, (name, beta) in enumerate(schedules.items()):
    ax = axes[i, 0]
    ax.plot(range(1, T+1), beta, color="C0")
    ax.set_xlim(0, T)
    ax.set_title(f"{name}\nBeta Schedule")
    if i == len(schedules)-1:
        ax.set_xlabel("Diffusion Step")
    ax.set_ylabel("Beta Value")
# === Plot histograms for each schedule/timestep ===
for i, (name, beta) in enumerate(schedules.items()):
    # Simulate forward diffusion for this beta schedule
    diffusion = [initial_data.copy()]
    for t in range(T):
        x_prev = diffusion[-1]
        noise = np.random.randn(n_samples)
        x_next = np.sqrt(1 - beta[t]) * x_prev + np.sqrt(beta[t]) * noise
        diffusion.append(x_next)
    for j, t in enumerate(timesteps):
        ax = axes[i, j+1]  # +1 to account for leftmost beta plot
        data = diffusion[t]
        # Histogram (not line plot)
        ax.hist(data, bins=bins, density=True,
                color=bar_color, edgecolor=edge_color, alpha=0.85)
        # At final timestep overlay Gaussian
        if t == T:
            ax.plot(x_grid, gauss_pdf, gauss_color+"--", lw=2, label="Std Gaussian")
            ax.legend()
        ax.set_xlim(-12, 12)
        ax.set_title(f"{name}\nstep {t}")
        if j == 0:
            ax.set_ylabel("Density")
        if i == len(schedules)-1:
            ax.set_xlabel("Value")
        # Only set ylim for the very first histogram (top-left)
        if i == 0 and j == 0:
            ax.set_ylim(0, 0.2)
plt.suptitle("Effect of Beta Schedule on Forward Diffusion", y=1.03, fontsize=16)
plt.tight_layout()
plt.show()
9. Summary#
Diffusion models add noise and then learn to reverse the process.
Forward process: gradually adds Gaussian noise.
Reverse process: neural network denoises step by step.
Training objective: predict noise at each step (MSE).
Conditional variants: classifier-guided or classifier-free.
Latent diffusion: runs in compressed space, enabling large-scale applications (e.g., text-to-image).
Diffusion models are now among the state-of-the-art generative models, surpassing VAEs and GANs in many domains.